AITopics | video representation

video representation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

[Figure: natural vs. ultrasound video; normal adult heart]

Neural Information Processing Systems, Jun-17-2026, 11:21:55 GMT

Self-supervised learning (SSL) has achieved major advances in natural image and video understanding, but challenges remain in domains like echocardiography (heart ultrasound) due to subtle anatomical structures, complex temporal dynamics, and the current lack of domain-specific pre-trained models. Existing SSL approaches, such as contrastive, masked-modeling, and clustering-based methods, struggle with high inter-sample similarity, with sensitivity to the low-PSNR inputs common in ultrasound, or with aggressive augmentations that distort clinically relevant features.
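The abstract points to sensitivity to low-PSNR inputs. As a reminder of what that quantity measures, here is a minimal NumPy sketch of peak signal-to-noise ratio, PSNR = 10 * log10(MAX^2 / MSE), for 8-bit frames. The function name `psnr`, the random toy frame, and the additive Gaussian noise are illustrative assumptions; none of this is taken from the paper's method.

```python
import numpy as np

def psnr(reference: np.ndarray, noisy: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    reference = reference.astype(np.float64)
    noisy = noisy.astype(np.float64)
    mse = np.mean((reference - noisy) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy 8-bit frame. Additive Gaussian noise is only an illustrative stand-in:
# ultrasound speckle is often modeled as multiplicative and spatially correlated.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
mild = np.clip(frame + rng.normal(0, 5, frame.shape), 0, 255)
heavy = np.clip(frame + rng.normal(0, 40, frame.shape), 0, 255)

print(f"mild noise:  {psnr(frame, mild):.1f} dB")   # higher PSNR
print(f"heavy noise: {psnr(frame, heavy):.1f} dB")  # lower PSNR
```

Because PSNR is logarithmic in the MSE, each 10 dB drop corresponds to a tenfold increase in mean squared error.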

artificial intelligence, machine learning, natural language

Country: Europe (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening

Neural Information Processing Systems, Jun-13-2026, 03:06:02 GMT

Our objective is to develop compact video representations that are sensitive to visual change over time. To measure such time sensitivity, we introduce a new task: chiral action recognition, where one needs to distinguish between a pair of temporally opposite actions, such as "opening vs. closing a door", "approaching vs. moving away from something", "folding vs. unfolding paper", etc. Such actions (i) occur frequently in everyday life, (ii) require understanding of simple visual change over time (in object state, size, spatial position, count . . .
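To see why chiral pairs probe time sensitivity, consider a representation that simply averages per-frame features over time. Averaging ignores frame order, so a clip and its time-reversed counterpart get the same embedding. The toy sketch below contrasts that with an order-aware summary (the mean of frame-to-frame differences). Both functions and the 1-D "door angle" feature are hypothetical illustrations; this is not the paper's latent-straightening method.

```python
import numpy as np

def mean_pool(features: np.ndarray) -> np.ndarray:
    """Average per-frame features over time (axis 0); frame order is ignored."""
    return features.mean(axis=0)

def temporal_difference(features: np.ndarray) -> np.ndarray:
    """Hypothetical order-aware summary: mean of frame-to-frame differences."""
    return np.diff(features, axis=0).mean(axis=0)

# Toy "door opening" clip: one feature (e.g. door angle) rising over 8 frames.
opening = np.linspace(0.0, 1.0, 8).reshape(8, 1)
closing = opening[::-1]  # chiral counterpart: the same frames played backwards

print(mean_pool(opening), mean_pool(closing))                      # same values (up to rounding)
print(temporal_difference(opening), temporal_difference(closing))  # opposite signs
```

Any order-agnostic pooling (mean, max, attention without positional information) has the same blind spot, which is what a chiral benchmark is designed to expose.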

artificial intelligence, machine learning, proceedings

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.37)

Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation

Neural Information Processing Systems, Jun-12-2026, 07:20:35 GMT

artificial intelligence, machine learning, proceedings

Industry: Health & Medicine > Therapeutic Area (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model

Neural Information Processing Systems, Apr-29-2026, 14:26:37 GMT

Current video-language models (VLMs) rely extensively on instance-level alignment between the video and language modalities, which presents two major limitations: (1) visual reasoning departs from how humans naturally perceive the world from a first-person perspective, leaving the reasoning uninterpretable; and (2) learning is limited in its ability to capture the inherent fine-grained relationships between the two modalities.

In this paper, we take inspiration from human perception and explore a compositional approach to egocentric video representation. We introduce HENASY (Hierarchical ENtities ASsemblY), which includes a spatiotemporal token grouping mechanism that explicitly assembles dynamically evolving scene entities through time and models their relationships for video representation. By leveraging compositional structure understanding, HENASY achieves strong interpretability via visual grounding with free-form text queries. We further explore a suite of multi-grained contrastive losses to facilitate entity-centric understanding. These comprise three alignment types: video-narration, noun-entity, and verb-entities alignment.

Our method demonstrates strong interpretability in both quantitative and qualitative experiments, while maintaining competitive performance on five downstream tasks via zero-shot transfer or as a video/text representation: video/text retrieval, action recognition, multi-choice query, natural language query, and moments query.

Project page: https://uark-aicv.github.io/HENASY
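The abstract lists video-narration alignment as one level of HENASY's contrastive objective. A common generic form of such an alignment loss is the symmetric, CLIP-style InfoNCE loss sketched below in NumPy: row i of the video batch should score highest against row i of the text batch, in both directions. The function names, the temperature `tau = 0.07`, and the toy embeddings are assumptions for illustration; this is not HENASY's published loss, and the noun-entity and verb-entities levels are not shown.

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def symmetric_info_nce(video: np.ndarray, text: np.ndarray, tau: float = 0.07) -> float:
    """Contrastive loss where video[i] and text[i] form the only positive pair for row i."""
    v = video / np.linalg.norm(video, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = v @ t.T / tau  # (B, B) cosine similarities scaled by temperature
    diag = np.arange(len(v))
    loss_v2t = -log_softmax(logits, axis=1)[diag, diag].mean()  # video -> text
    loss_t2v = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> video
    return float((loss_v2t + loss_t2v) / 2)

rng = np.random.default_rng(0)
video_emb = rng.normal(size=(4, 16))
matched_text = video_emb + 0.1 * rng.normal(size=(4, 16))  # narrations paired with each clip
shuffled_text = matched_text[[1, 2, 3, 0]]                 # every pair mismatched

print(symmetric_info_nce(video_emb, matched_text))   # small loss
print(symmetric_info_nce(video_emb, shuffled_text))  # large loss
```

When all embeddings are identical, every logit is equal and the loss reduces to log(B), the chance level for a batch of B pairs.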

artificial intelligence, natural language, proceedings

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Unsupervised Learning of View-invariant Action Representations

Neural Information Processing Systems, Mar-16-2026, 19:33:07 GMT

Most recent successes in human action recognition with deep learning adopt the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is expensive and time-consuming. In this work, we propose an unsupervised learning framework that exploits unlabeled data to learn video representations. Unlike previous work in video representation learning, our unsupervised learning task is to predict 3D motion in multiple target views using the video representation from a source view. By learning to extrapolate cross-view motions, the representation can capture view-invariant motion dynamics that are discriminative for the action. In addition, we propose a view-adversarial training method to enhance the learning of view-invariant features. We demonstrate the effectiveness of the learned representations for action recognition on multiple datasets.
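The abstract does not say how the view-adversarial training is implemented. One widely used mechanism for adversarial feature learning is a gradient reversal layer, as used in domain-adversarial training: it is the identity in the forward pass and multiplies the gradient by -lam in the backward pass, so a view classifier learns to predict the view while the encoder is pushed to hide it. The NumPy sketch below assumes that mechanism, a linear toy encoder and classifier, a hand-written backward pass, and the raw classifier score as a stand-in loss; all names are hypothetical and none of it is confirmed as this paper's method.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales the incoming gradient by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam

    def forward(self, features: np.ndarray) -> np.ndarray:
        return features

    def backward(self, grad_output: np.ndarray) -> np.ndarray:
        return -self.lam * grad_output

rng = np.random.default_rng(0)
x = rng.normal(size=3)       # toy input clip feature
W = rng.normal(size=(2, 3))  # toy encoder: f = W @ x
w = rng.normal(size=2)       # toy view classifier: score = w @ f
grl = GradientReversal(lam=1.0)

def view_loss(W: np.ndarray, w: np.ndarray) -> float:
    # Stand-in loss: the raw classifier score (a real setup would use cross-entropy over views).
    return float(w @ grl.forward(W @ x))

# Manual backward pass: dL/dw = f, and dL/df = w, which the GRL negates before it reaches W.
f = W @ x
grad_w = f
grad_W = np.outer(grl.backward(w), x)

lr = 0.1
before = view_loss(W, w)
after_classifier_step = view_loss(W, w - lr * grad_w)  # classifier descends: loss decreases
after_encoder_step = view_loss(W - lr * grad_W, w)     # encoder follows reversed gradient: loss increases
print(before, after_classifier_step, after_encoder_step)
```

The single shared loss is minimized by the classifier and effectively maximized by the encoder, which is the min-max game adversarial training relies on; `lam` trades this off against the main (here, cross-view motion prediction) objective.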

artificial intelligence, machine learning, proceedings

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

9d66f74820f11ce037fb5f711ab9acd4-Paper-Conference.pdf

Neural Information Processing Systems, Feb-16-2026, 23:49:49 GMT

large language model, machine learning, natural language

Country:

North America > United States > Arkansas > Washington County > Fayetteville (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition

Jinwoo Choi, Chen Gao, Joseph C. E. Messou, Jia-Bin Huang

Neural Information Processing Systems, Feb-13-2026, 12:55:57 GMT

Such biases are known as representation bias [38].

artificial intelligence, machine learning

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Unsupervised Learning of View-invariant Action Representations

Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

Neural Information Processing Systems, Feb-12-2026, 13:18:16 GMT

Neural Information Processing Systems http://nips.cc/

action recognition, recognition, representation

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore > Central Region > Singapore (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.85)

Appendices

Neural Information Processing Systems, Feb-11-2026, 04:30:33 GMT

Note that p_pos is task-specific; here we use the class oracle, i.e., the ImageNet-100 labels, to define the positive samples. In Figure 1, we plot the proxy task performance, i.e., the percentage of queries for which the positive key is ranked above all negatives, across training for MoCo [19], MoCo-v2 [10], and some variants in between. As mentioned above, all results in Figure 1 are for the same τ = 0.2. Ablations showed that this yields, at best, performance as good as mixing with the query, but on average about 0.1-0.2% lower. This weighting scheme also resulted in slightly inferior results.
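The proxy task performance described above can be computed directly from similarity scores. The NumPy sketch below assumes each query has one positive-key similarity and K negative-key similarities, and counts a query as correct only when the positive scores strictly above every negative (ties count as failures). The function name and the toy scores are hypothetical.

```python
import numpy as np

def proxy_task_accuracy(pos_sim: np.ndarray, neg_sim: np.ndarray) -> float:
    """Percentage of queries whose positive key outscores every negative key.

    pos_sim: shape (N,), similarity of each query to its positive key.
    neg_sim: shape (N, K), similarity of each query to K negative keys.
    """
    ranked_first = pos_sim > neg_sim.max(axis=1)  # strict: a tie counts as a miss
    return float(100.0 * ranked_first.mean())

pos = np.array([0.9, 0.5, 0.7])
neg = np.array([[0.1, 0.2], [0.6, 0.3], [0.65, 0.2]])
print(proxy_task_accuracy(pos, neg))  # 2 of 3 queries rank the positive first
```

Dividing all similarities by a temperature τ > 0 does not change which key ranks first, so this metric depends on τ only through its effect on training, which is one reason to compare variants at a fixed τ.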

artificial intelligence, machine learning

Country: